ABSTRACT

A method for improving curriculum and schools through 
the local development of competency tests in basic skills — the 
Competency-Rossville Model (CRM) — is outlined. The method was 
originated in the school system of Rossville (Illinois) and has been 
tested in five other midwestern school systems. The approach leads 
the faculty of the school, with the guidance of a measurement 
consultant, in the development of a series of grade-level tests to 
measure mastery in basic skills achievement. This method of test 
development serves to articulate the curriculum; provides a useful, 
relevant, and appropriate achievement testing program; and provides a 
management system for the improvement of basic skills instruction. 
The CRM is compared favorably to the norm referenced testing model. 
The CRM program includes: (1) inservice instruction of faculty in an 
alternate model of evaluation of academic achievement; (2) a 
committee of school faculty formed around use of the Delphi method to 
confer with their fellow teachers; (3) grade-level representatives in 
charge of informal committees for each grade level; (4) informal 
teacher conferences, involving each grade level, to establish a list 
of skills to be mastered by students; (5) development of test items; 
and (6) computer-assisted interpretation of tests. This type of 
inservice development project has been successful in increasing 
faculty morale, improving basic skills instruction, and improving 
school achievement testing programs. The project involves faculty 
actively in curriculum development, results in valid and reliable 
tests, and provides information that is valuable and useful to 
teachers. Nine figures are provided, and a sample skill-referenced 
math test is appended. (TJH) 
ABSTRACT



Many articles and several books have been written that 
describe the shortcomings of the classical psychometric 
approach to achievement testing. 

Among the complaints concerning testing is that the 
measures are not valid or appropriate for the kinds of 
learning that are happening in schools today. Such tests are 
said to be biased, and the information that they provide is 
not useful because it is often filed and forgotten. 

This presentation describes a tried and proven method 
for improving schools through the local development of 
competency testing in the basic skills. The method was 
originated in the school system of Rossville, Illinois, and 
has been tested in five other midwestern school systems. 

The plan is for a measurement consultant to guide the 
school faculty in developing a series of grade level tests 
to measure mastery in basic skills achievement. This method 
of test development serves to articulate the curriculum, to 
provide a useful, relevant, and appropriate achievement 
testing program, and to provide a management system for the 
improvement of basic skill instruction. 

The program begins with inservice instruction in which the 
faculty are taught an alternate model of the evaluation of 
educational achievement. A committee of school faculty is 
formed and then uses the Delphi method to confer with 
their fellow teachers. Grade level representatives are in 
charge of an informal committee for their respective grade 
level. Teachers confer informally within each grade level 
and establish a list of skills that they agree students 
should have mastered at the end of the grade level they are 
teaching. These skills lists consist of just the behavior 
part of the behavioral objectives for that grade level. 

Test items are developed to test the skills. The test 
development phase of the project uses item pools. 
Items are drawn from the item pool to form competency 
tests in the basic skills at each grade level. Test 
questions are designed with the goal that at 
least 70 per cent of the students will answer at least 70 
per cent of the items correctly for each of the skills that 
are measured. When this goal is not met, each unmastered 
skill is examined to determine whether the test item, the 
objective, or the instruction needs to be improved for the 
next class at that grade level. 

Tests are scored and results are interpreted by 
computer. Reports of each student's progress are provided to 
each teacher and to the child's parents. 

This form of test development has the following 
advantages: 



1. Faculty are actively involved in curriculum 
development. 

2. The tests are valid, reliable, and appropriate. 

3. The tests provide information that is valuable 
and useful to teachers rather than just filed and 
forgotten, the way the results of standardized 
tests are. 

Teachers enthusiastically support this method of 
testing. Parents value the wealth of information that the 
reports they receive provide about their children's progress 
in school. Of the six school systems that have attempted 
this program, all are presently continuing it and most are 
planning to expand it. 

This type of inservice development project has been 
successful, wherever it has been tried, in increasing faculty 
morale, improving basic skill instruction, and improving the 
achievement testing program in the school. Attempts will be 
made to extend the program to school subjects other than 
the basic skills and to provide additional benefits within 
the program. 
Improving Schools through Inservice Test Construction: The Rossville Model 



by David Alan Bilman, Ph.D. 
Indiana State University 



Education has its share of good guys (programs which generally 
merit positive comments) such as the school lunch program, basic skills 
instruction, and gifted-talented programs. However, there is also an 
adequate supply of bad guys (programs which draw more blame than 
praise). Among these are merit pay, vandalism, poor discipline, and 
standardized tests. 

Although it is recognized that teachers could utilize the 
information that achievement testing provides, there has been a 
constant barrage of criticisms directed toward the classical 
psychometric model as it is applied to educational achievement testing. 
This model for testing provides a basis for the simple ranking of 
students from high to low. The criticisms have not been directed 
toward the capability of the model to accomplish such a sorting, but 
rather have been directed toward the educational outcomes that occur 
as a result of such rankings. For some time now, educators have been 
questioning the amount of time that schools spend ordering and sorting 
students from high to low according to their various abilities. 

The critics of this type of testing have been so vocal that they 
have succeeded in having all standardized intelligence testing removed 
from the New York City Public Schools and their protests have caused the 
National Education Association to recommend a complete moratorium on 
all standardized testing in U. S. schools. 

Table I contains a summary of the criticisms that have been 
directed toward the classical psychometric model as it is applied in 
norm referenced achievement tests. 

Table I 

Criticisms of the Classical Psychometric Approach to Achievement Testing 

Substantive Issues                  Humanistic Issues 
------------------                  ----------------- 
Invalid                             Degrading 
Biased                              Labels students 
Not useful to teachers              Destructive competition 
Tests define the curriculum         Promotes dishonesty 
Filed and forgotten                 Impersonal 
Inaccurate                          Unfair 
Wastes time                         Puts pressure on students 
Misunderstood 
Results are uninterpretable 
No management system 
Substantive issues. Critics of standardized testing claim that such tests are 
invalid because skills measured on these tests are often not the ones 
taught in the classroom. 

Although test publishers try to combat it, their tests are 
constantly criticized for being culturally biased against various 
minority groups. 

The results of these tests only serve to rank students from high 
to low and consequently the results are not of any particular use to 
teachers or to school administrators in helping students overcome their 
specific learning difficulties. 

Some educators believe that these tests influence their 
school's curriculum in ways that infringe on the autonomy of the 
faculty and/or the local school board. The content of standardized 
tests is determined in such places as Iowa City, Iowa, or Princeton, New 
Jersey, and in some instances exerts a direct influence on what is 
taught in local school districts. 

Since most teachers do not understand the intricate contingencies 
that are involved in the classical psychometric model of testing, 
they do not understand how the tests are to be utilized and do not know 
how the results are interpreted. 

The scores of standardized tests are derived in such a way as to 
indicate a rather abstract relationship between each student's level of 
performance and the normal curve. Most teachers have never fully 
understood what these scores are trying to tell them. It is 
probably fair to say that some teachers do not possess the mathematical 
ability to analyze these results in a way that would let them 
benefit from the information the scores contain. 

The process of standardized testing has no accompanying 
instructional management system that can direct educators to what can 
be done to solve the specific problems of an individual student. 

Humanistic issues. Testing is said to be degrading because the 
constant threat of failure causes low-achieving students to lose self-esteem 
since they constantly expect to receive yet another low score 
each time they are tested. 

Since students constantly compete to outdo each other in order 
to obtain a higher score than their fellow classmates, they enter into 
what psychologists refer to as destructive competition. 

Standardized tests brand students with labels that cause their 
teachers to identify them as a 4.1 in reading, a 78 I.Q., learning 
disabled, or borderline retarded. In discussions among teachers, these 
labels are used frequently when referring to specific students. 

Because tests provide no escape for those who have special 
temporary or persistent learning difficulties or have inadequate test 
taking aptitudes, tests are said to be impersonal. 

Because of biases of tests, because of their impersonal nature, 
and because of the varying degree of test wiseness among students, 
tests are said to be unfair. 

Tests cause students to be anxious and concerned about their 
performance in relation to other students, and thus tests put students 
under pressure. 

This list of the criticisms of standardized tests is comprehensive, 
although probably not exhaustive. It is surprising that more has not 
been done to promote an alternate model of educational evaluation. 
Although the criterion referenced test model (CRT) was proposed a few 
years ago, it has received limited acceptance because neither teachers 
nor testing experts have fully understood the model. This has led to 
disagreement among measurement specialists as to the intent and purposes 
of criterion referenced measurement. However, a new and different 
approach to the measurement of basic skills achievement has evolved and 
has been tried in six rural school districts in western Indiana and 
eastern Illinois. What has evolved is not the currently accepted CRT 
model but rather the model that the originators of CRT envisioned. To 
distinguish the model under discussion from CRT, the model described 
here will be referred to as the Competency-Rossville Model or CRM. 



The Trouble With Behavioral Objectives 

Since 1960, behavioral objectives have been utilized by some 
educators as a tool to specify test content, validate tests, and to 
articulate curricula. Behavioral objectives require three components 
to be specified in each objective. These are: 

1. A behavior (something the learner must do to show that 
learning has occurred). 

2. Conditions (what the learner will be provided or denied 
in the test situation). 

3. A criterion (a minimum standard of acceptable 
performance). 

Because of the specificity required to articulate three parts 
for each objective and the amount of verbal material that specification 
of objectives for each grade level requires, books of objectives are 
thick, cumbersome, and awkward to use. When teachers bring their 
behavioral objectives to workshop sessions, it is amusing to watch them 
blow the accumulated dust from the covers of their objectives books so 
that they will not get themselves dusty when they use them. 

An approach that has proven to be more beneficial is the 
development of skills lists. These are lists of skills for each 
subject tested at each grade level that teachers expect students to 
have mastered. Each item from these skills lists is just the behavior 
part of a behavioral objective. 

Whereas a behavioral objective might be presented as: 

When provided with a list of thirty long division problems 
and without the use of a calculator, the student will solve at 
least twenty-four of them correctly. 

The item from a skills list would appear as: 

Solves long division problems 
Although such verbs as "to know", "to understand", and "to 
appreciate" are usually not permitted in behavioral objectives, these 
words are permitted in skills lists. 

Because of the economy of wording, teachers may have at their 
desks laminated copies of the lists of skills that students are 
expected to master at their grade level. Teachers can refer to these 
lists as often as they need to and are constantly reminded of the 
basic skills curriculum. 

One wrinkle with using the skills list approach is that there is 
no chance to specify different criteria for the various skills. Rather, 
all objectives are specified and tests are designed so that the same 
agreed upon percentage of correctly answered skills test items serves 
as the criterion for each grade level. For purposes of these tests, 
the agreed upon percentage has been a criterion of 70 percent of the 
items answered correctly. 

Copies of skills lists are provided as cover pages for the 
tests. Examples of the tests and skills lists are contained in the 
Appendix of this report. 



Norm Referenced Model (NRM) versus Competency-Rossville Model (CRM) 

Any testing model has a philosophy which provides the basis for 
the procedures that are to be followed in the evaluative phase of 
instruction. Both NRM and CRM are supported by their respective 
philosophies. Furthermore, the philosophies of these two varieties of 
measurement are so fundamentally different that it seems virtually a 
coincidence that both are categorized as "educational measurement" and 
that the instruments of each are referred to by the same name "test". 

In order for one to understand CRM, it is convenient to contrast 
it with the more common type of testing which is NRM. 

A competency level or criterion in CRM is a standard of 
performance which serves as a minimum level to be used in a decision- 
making process. The competency level in CRM is the minimum score or 
rate that can be considered as an acceptable performance or as a 
minimal passing score. 

In figure 1, the minimum standard of acceptable performance (the 
criterion) is that the student can answer 90% of the items correctly. 
Student P answered 95% of the items correctly. Since this score is 
above the criterion, Student P passed the test. Student F answered 
only 75% of the items correctly. His score is below criterion and thus 
Student F did not pass the test. 
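The pass/not-pass decision in this example can be written as a small Python sketch. The function name `meets_criterion` is hypothetical, and counting a score exactly at the criterion as passing is an assumption drawn from the later statement that students who attain the criterion or above pass. No other student's score enters the calculation, which is what makes the interpretation absolute.

```python
def meets_criterion(percent_correct: float, criterion: float) -> bool:
    """Absolute interpretation: compare one student's score with a fixed minimum standard.

    Assumption: a score exactly at the criterion counts as passing.
    """
    return percent_correct >= criterion


CRITERION = 90.0  # minimum standard of acceptable performance in the example above
scores = {"Student P": 95.0, "Student F": 75.0}

for student, score in scores.items():
    verdict = "pass" if meets_criterion(score, CRITERION) else "not pass"
    print(f"{student}: {score:.0f}% -> {verdict}")
# Student P: 95% -> pass
# Student F: 75% -> not pass
```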



Figure 1 goes about here. 



CRITERION: MINIMUM STANDARD OF ACCEPTABLE PERFORMANCE = 90% 
STUDENT P's SCORE = 95 (above criterion) 
STUDENT F's SCORE = 70 (below criterion) 

FIGURE 1 

SCORES ON A CRITERION REFERENCED TEST 

A norm may be thought of as an average. The mean, median, and 
mode are all examples of norms. Some of the types of scores derived 
from norm referenced information are percentiles, grade equivalent 
scores, age equivalent scores, I.Q. scores, standard scores, and 
stanines. To obtain these types of scores for any student, it is 
necessary to obtain the mean or some other type of norm for the group 
that the student belongs to. Frequently the relative distance a 
student scores from the mean is measured in units of standard 
deviations. A standard score of -1.0 means the student's score is one 
standard deviation below the mean, while a standard score of +2.0 
indicates the student's score is 2.0 standard deviations above the mean 
(see Figure 2). 
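A standard score of the kind just described is the raw score minus the group mean, divided by the group standard deviation. The Python sketch below is illustrative only: the group of scores is hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption, since the text does not say which is used.

```python
from statistics import mean, pstdev


def standard_score(raw, group_mean, group_sd):
    """Distance of a raw score from the group mean, measured in standard deviations."""
    return (raw - group_mean) / group_sd


# Hypothetical group of scores, for illustration only.
group = [70, 70, 90, 90]
m, sd = mean(group), pstdev(group)   # m = 80, sd = 10.0 (population SD)

print(standard_score(70, m, sd))     # -1.0: one standard deviation below the mean
print(standard_score(100, m, sd))    # 2.0: two standard deviations above the mean
```

The same raw score would produce a different standard score in a different group, because both the mean and the standard deviation come from the group rather than from a fixed standard.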



Figure 2 goes about here. 



Norm referenced tests are used to find out how each individual 
performs in relationship to the performance of other individuals who 
have taken the same test. The meaning of a norm referenced test score 
is derived from its comparison to the norm or average and consequently 
from its comparison to the scores of other students. Almost all 
classroom tests and standardized intelligence tests are norm referenced 
measures. Because they measure a student's degree of learning relative 
to the degree of learning of others and relative to the normal curve, 
they are sometimes referred to as relative tests. 
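A percentile rank shows this relative interpretation directly. The Python sketch below uses one common definition (the percentage of the group scoring below a score, with tied scores counted as half); that convention and both groups of scores are assumptions for illustration, since publishers compute percentile ranks in several ways.

```python
def percentile_rank(score, group):
    """Percentage of the group scoring below `score`, with tied scores counted as half.

    This tie-handling rule is one common convention; test publishers use several variants.
    """
    below = sum(1 for x in group if x < score)
    tied = sum(1 for x in group if x == score)
    return 100.0 * (below + 0.5 * tied) / len(group)


# Two hypothetical groups of ten raw scores.
typical_group = [12, 15, 15, 18, 20, 22, 22, 25, 27, 30]
stronger_group = [22, 24, 25, 26, 27, 28, 29, 30, 31, 32]

print(percentile_rank(22, typical_group))   # 60.0
print(percentile_rank(22, stronger_group))  # 5.0
```

The same raw score of 22 earns a very different rank depending on who else took the test, which is exactly what makes the norm referenced interpretation relative.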

CRM is one example of what can be called an absolute form of 
testing. Absolute interpretation of test scores involves making a 
judgment about the score of a student in terms of how his unique 
individual performance on a test relates to a minimum standard. 
Recently, a great deal of attention has been devoted to 
absolute measurement by practitioners in a variety of areas. 

An absolute interpretation of test scores is advocated in such 
diverse fields as individualized instruction, programmed instruction, 
computer-assisted instruction, non-graded schools, governmental and 
military education, performance based education, the systems approach 
to education, minimum competency testing, early childhood education, 
the British open school, competency based education, special education, 
and physical education. 

CRM focuses attention on whether students are able to do certain 
tasks acceptably. It is because the learner is compared to an 
established standard, rather than to other individuals, that 
these measures have educational value. The meaningfulness of any 
learner's score is not dependent on any comparison with scores of other 
learners. 



CRM, Behavioral Objectives, and Skills Lists 

The process of absolute testing has been closely tied to stating 
goals of instruction in behavioral objectives. However, as has been 
stated earlier in this report, behavioral objectives are frequently 
awkward to use. The CRM reported here uses only the 
behavior part of behavioral objectives. These behaviors are listed 
in what are called skills lists. 

Minimum standards vary depending on the task and its desired 
degree of attainment. Figure 3 shows some of the criterion levels that 
may be specified for various performances. 
FIGURE 2 

A DISTRIBUTION OF TEST SCORES WITH A MEAN 
OF 90 AND A STANDARD DEVIATION OF 10 
Figure 3 goes about here. 



An airline pilot will be expected to perform flawlessly on tests 
designed to measure piloting skills. A bright fourth grader may be 
expected to master all of the 100 multiplication facts. However, a 
social studies teacher may expect slow learning students to score 
only at least 60% on a semester test. Consequently, the 
standard is set at 60%. A general education course taught at the 
college level may be taught in such a way that the instructor will 
consider that students have mastered the material if they score higher 
than the criterion of 90%. 

A frequently specified minimum standard is 70%. When a teacher 
sets up objectives for a class, the instruction and the CRM exercises 
are designed and constructed in a way that explicitly defines rules 
linking patterns of test performance to the skills lists. If 70% is 
the criterion score, then any student who scores above 70% will be 
considered by the teacher to have learned the material. Students who 
score lower than 70% are considered to be below the desired level of 
mastery. 



A Double Criterion 

Many instructors also use CRM to determine 
whether they are doing an effective job in teaching their classes by 
specifying a double criterion. The double criterion specifies the 
level of performance expected by each student in the class and also 
specifies the number of students that should meet this standard in 
order for the instructor to consider the instruction to be successful. 

The double criterion specified in these tests is the 70-70 
criterion. The 70-70 criterion means that the teacher will consider 
his/her work to be effective if 70% of the students are able to obtain 
a score of at least 70% on the test. Any student who scores above 70% 
will be considered as having satisfactorily mastered the material. If 
70% or more of the students score above this minimum level, the 
instruction is considered to have been satisfactory. 
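The 70-70 double criterion amounts to two checks, one for each student and one for the class. A minimal Python sketch follows; the function name and the ten scores are hypothetical, and the sketch treats a score of exactly 70% as meeting the criterion, following the phrase "a score of at least 70%".

```python
def double_criterion(scores, student_criterion=70.0, class_criterion=70.0):
    """Apply a double criterion such as 70-70.

    scores maps each student to a percent-correct score. Returns the students who met
    the individual criterion, the percentage of the class they represent, and whether
    that percentage reaches the class criterion (instruction judged effective).
    """
    masters = [name for name, pct in scores.items() if pct >= student_criterion]
    share = 100.0 * len(masters) / len(scores)
    return masters, share, share >= class_criterion


# Hypothetical class of ten students.
scores = {"A": 85, "B": 72, "C": 70, "D": 65, "E": 90,
          "F": 78, "G": 55, "H": 88, "I": 74, "J": 69}
masters, share, effective = double_criterion(scores)
print(len(masters), share, effective)  # 7 70.0 True
```

Both thresholds are parameters, so an instructor who chooses a different double criterion (for example 90-90) only changes the two arguments.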

The choice of the level of the criterion or the levels of the 
double criterion is usually made by the instructor and depends on 
the level of competency of the students, the importance 
of the task, and the level of the instructor's aspirations. However, 
in most educational circumstances, a reasonable and challenging goal 
for any instructional setting is the 70-70 criterion. 



Steps in Constructing CRM Tests 

The sequence for constructing NRM is typically to first teach, 
then design a test, and finally to administer it. The step-by-step 
procedure for utilizing CRM is a logical and rational methodology. 
Moreover, some advocates of CRM feel that to follow the steps required 
for the construction of CRM instruments virtually ensures that the 
instruction will be effective. 



SUITABLE CRITERIA 

TEST                              CRITERIA 
AIRLINE PILOT                     FLAWLESS 
4TH GRADE MATH FACTS              100% 
MILITARY TRAINING                 95% 
GRADUATE MEASUREMENT CLASS        90% 
FREE THROW SHOOTING               70% 
SLOW LEARNER, SOCIAL STUDIES      60% 
BASEBALL HITTING                  .250 

FIGURE 3 

ACCEPTABLE CRITERIA FOR VARIOUS SITUATIONS 
The steps for constructing CRM are as follows. Before 
instruction begins and before the test is constructed, the desired 
skills that are to be mastered are carefully specified in the skills 
lists. Situations are then created in which performance of the skills 
is to be demonstrated by the students. These sample situations 
constitute the CRM instrument. Next, instruction is planned so as to 
accomplish the mastery of the skills. After the instruction has been 
completed, the CRM instruments are administered to find (1) which 
students mastered which skills, as demonstrated by their criterion scores 
on the various skills, and (2) whether instruction was accomplished 
effectively, as demonstrated by the percentage of students who attain 
the criterion score. 
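The two findings named in this step, which students mastered which skills and whether each skill was taught effectively, can be sketched in Python by tagging every test item with its skills-list entry. The item-to-skill mapping, the second skill, the student responses, and all function names below are hypothetical; the 70% student criterion and 70% class criterion follow the 70-70 criterion used in these tests.

```python
from collections import defaultdict

# Hypothetical mapping from test items to skills-list entries (illustration only).
ITEM_SKILL = {
    "q1": "Solves long division problems",
    "q2": "Solves long division problems",
    "q3": "Solves long division problems",
    "q4": "Adds fractions with like denominators",
    "q5": "Adds fractions with like denominators",
}


def skill_percentages(responses):
    """Percent correct on each skill for one student (responses: item -> 1 right / 0 wrong)."""
    right, total = defaultdict(int), defaultdict(int)
    for item, correct in responses.items():
        skill = ITEM_SKILL[item]
        right[skill] += correct
        total[skill] += 1
    return {skill: 100.0 * right[skill] / total[skill] for skill in total}


def mastery_report(class_responses, criterion=70.0, class_criterion=70.0):
    """Return (1) the skills each student mastered and (2) skills below the class criterion."""
    mastered = {
        name: {s for s, pct in skill_percentages(r).items() if pct >= criterion}
        for name, r in class_responses.items()
    }
    needs_review = []
    for skill in sorted(set(ITEM_SKILL.values())):
        share = 100.0 * sum(skill in m for m in mastered.values()) / len(mastered)
        if share < class_criterion:
            # Candidate for revising the objective, the test items, or the instruction.
            needs_review.append(skill)
    return mastered, needs_review


# Hypothetical responses for four students.
class_responses = {
    "Ann": {"q1": 1, "q2": 1, "q3": 1, "q4": 1, "q5": 0},
    "Ben": {"q1": 1, "q2": 1, "q3": 0, "q4": 1, "q5": 1},
    "Cal": {"q1": 1, "q2": 1, "q3": 1, "q4": 0, "q5": 0},
    "Dee": {"q1": 1, "q2": 1, "q3": 1, "q4": 1, "q5": 1},
}
mastered, needs_review = mastery_report(class_responses)
print(needs_review)  # ['Adds fractions with like denominators']
```

With only three items for a skill, Ben's 2 of 3 (about 67%) falls short of 70%, so the per-skill item count effectively determines how strict a fixed percentage criterion is.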

Although the above sequence represents the pattern 
that occurs in CRM, Figures 4 and 5 give a more practical 
representation of the sequence of CRM. 



Figure 4 goes about here. 



Figure 4 illustrates what actually occurs in CRM. First the 
objectives are stated in the form of a list of skills to be mastered. 

Next, the test is constructed in such a manner as to determine 
if the student can demonstrate the accomplishment of the behaviors 
described in the skills list. It is interesting to note that in the 
sequence of CRM, test construction is the second step, while in NRM it 
is the next to last step. 

Instruction is then performed in an attempt to master the skills 
on the skills list. Some critics of CRM have faulted this step of the 
procedure by asserting that at this point the instructor is "teaching 
to the test." It is a matter of individual perception as to whether 
that is happening or whether the skills are being taught, rather than 
the test. It is equally senseless to debate whether there is anything 
inherently wrong with teaching about the concepts that will be 
contained in test items. 

After the instruction is completed, the CRM instrument is 
administered and scored. There are only two possible scores for each 
skill. Students who score above the criterion pass and those who score 
lower than the criterion do not pass. 

The scores of all students are then evaluated to determine if 
the instruction was effective. If the desired percentage of students 
attain a passing score, the instructor may conclude that students are 
mastering the skills and that learning is being accomplished 
satisfactorily. If fewer than the desired percentage of students attain 
criterion, then the instructor must conclude that instruction has not 
been as effective as desired. The next step is to try to 
reason whether the objectives, the test, or the instruction should be 
changed on the next attempt at teaching the material. 

Figure 5 demonstrates a step-by-step procedure for CRM. 



Figure 5 goes about here. 



In Figure 5, the criterion levels were not specified because 
each instructional situation requires a decision as to the level that 
the learners should attain. 

FIGURE 4 


1. State objectives. 

2. Prepare CRM instrument to measure objectives. 

3. Teach to accomplish objectives. 

4. Administer and score CRM instrument. 

5. If a student scores above ___%, he has mastered the instruction. 

6. If ___% of the students score above ___%, instruction is effective. 

7. Decide if a change is needed in objectives, CRM instrument, or the instruction. 

FIGURE 5 
STEPS IN CRM 

Figure 6 shows a model of the decision-making process associated 
with CRM and contrasts it with the process traditionally followed in 
NRM. 



Figure 6 goes about here. 



From Figure 6, it may be observed that there is no attempt made 
in NRM to revise instruction on the basis of the product results as 
measured by the NRM instrument. In the CRM process, however, revisions 
are made to the test, the instruction, or the objectives if results 
indicate that the skills are not being mastered. 



Differences in NRM and CRM 

It was noted earlier in this paper that both CRM and NRM are 
supported by their respective measurement philosophies and that the 
philosophies of the two varieties of measurement are strikingly 
dissimilar. The measurements in CRM and NRM each follow their 
respective measurement philosophies. Some of the differences are noted 
in the paragraphs below and are summarized in Figure 7. 



Figure 7 goes about here. 



Trait or ability to be measured. In NRM, the trait or ability 
to be measured is assumed to be present in varying degrees in different 
individuals. It is the purpose of NRM to order those individuals on a 
continuum ranging from highest to lowest in terms of the amount of that 
trait or ability that the learner possesses. In CRM, the trait or 
ability is assumed to be present in either a sufficient or an 
insufficient amount in different individuals. It is the purpose of CRM 
to separate those individuals who have attained a prescribed level of 
mastery of the trait or ability from those who have not. 

Previously acquired skills. Furthermore, CRM items are likely 
to be fashioned so that they focus on the measurement of the actual 
instruction, while controlling for or eliminating the measurement of 
previously learned traits, abilities, and prior achievements of the 
examinee. 

Range of scores. In NRM, the test is designed so that students' 
test scores range from a low which is approximately equal to the chance 
level of the test to a high which may be equal to 100%. 

CRM scores are considered to be passing if the student attains 
the criterion or above and are considered to not be passing if the 
student does not attain the criterion score. CRM scores can only take 
one of two possible values. The two values are variously specified as 
pass-not pass, pass-fail, go-no go, adequate-inadequate or yes-do over. 
The two-value scoring of CRM is frequently referred to as producing 
dichotomous data. However, it could logically be argued that 
instruction is most effective when everyone receives the same score of 
pass, and therefore there would be only one rectangular distribution. 

NRM MODEL 

INPUT (INSTRUCTION) --> PRODUCT (NRM RESULTS) 

CRM MODEL 

INPUT (INSTRUCTION) --> PRODUCT (CRM RESULTS) --> RESULTS OK? 
                                                  YES 
                                                  NO --> REVISE INPUT 

FIGURE 6 

DECISION MAKING PROCESS IN NRM AND CRM 

                              NRM                         CRM 
VALIDITY                      CONTENT                     CURRICULAR 
PREVIOUSLY ACQUIRED SKILLS    TESTED                      NOT CONSIDERED 
COMPARISON                    STUDENT-STUDENT             STUDENT-CRITERION 
FUNCTION                      ORDERING STUDENTS           EVALUATING INSTRUCTION 
INSTRUCTION FOR TEST          MAXIMIZE MATERIAL COVERED   MAXIMIZE OBJECTIVES 
DOMAIN OF INSTRUCTION         COGNITIVE                   PSYCHOMOTOR OR COGNITIVE 
DIFFICULTY                    MEDIUM                      EASY 
DISCRIMINATION                HIGH                        ZERO 
RELIABILITY                   IMPORTANT                   UNIMPORTANT 
RANGE OF SCORES               HIGH                        LOW 
TRAIT MEASURED                IN VARYING DEGREES          GO OR NO GO 
TYPE OF SCORES                NUMBER CORRECT              DICHOTOMOUS 
DISTRIBUTION                  NORMAL                      RECTANGULAR 
STATISTICAL ANALYSIS          PARAMETRIC                  NONPARAMETRIC 

FIGURE 7 

SUMMARY OF DIFFERENCES BETWEEN NRM AND CRM 

Difficulty of items. Most test theorists believe that norm 
referenced test items of medium difficulty will produce the greatest 
discrimination, provide the most information, and contribute most 
to the reliability of the test. Test experts specify that the best 
items on a norm referenced test are those for which the number of 
correct responses is approximately halfway between chance and 100%. 
This means that for an essay or short answer completion test item, the 
ideal difficulty level would be for only half of the students to 
respond correctly. 
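The "halfway between chance and 100%" rule can be checked with a short Python sketch. The function names are hypothetical, and the four-option multiple-choice case is added here only for illustration; the text itself mentions just the essay or short answer case, where the chance level is taken as zero. The observed difficulty index is simply the proportion of examinees answering an item correctly.

```python
def ideal_nrm_difficulty(num_options=None):
    """Proportion correct halfway between the chance level and 1.0 (100%).

    num_options=None stands for an essay or short-answer item, where chance is taken as 0.
    """
    chance = 0.0 if num_options is None else 1.0 / num_options
    return (chance + 1.0) / 2.0


def item_difficulty(responses):
    """Observed difficulty index: proportion of examinees who answered the item correctly."""
    return sum(responses) / len(responses)


print(ideal_nrm_difficulty())    # 0.5   (essay or short answer)
print(ideal_nrm_difficulty(4))   # 0.625 (four-option multiple choice)
print(item_difficulty([1, 1, 1, 1, 1, 1, 1, 1, 1, 0]))  # 0.9
```

An item that 9 of 10 students answer correctly is far easier than the NRM ideal but is typical of the relatively easy items found on CRM instruments.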

Neither psychology nor common sense support motivating students 
by asking them questions that only half of them can answer correctly. 

Although the actual difficulty level of CRM instruments depends 
on the ability of the group of students involved, the level of mastery 
required, and the objectives of the instructor, traditionally CRM items 
are relatively easy test items. Sometimes a criterion of 90% is 
specified. In this case, 90% of the students can be expected to answer 
most items correctly. 

Domain of instruction. It is difficult to make generalizations 
about the domain of instruction that is measured by the two types of 
tests, but it is fairly safe to say that NRM has most often been used 
for measuring learning of the factual information and concepts that are 
usually referred to as the cognitive domain. While CRM can be readily 
used for measurement in the cognitive domain, the nature of CRM also 
makes it especially useful for measuring the learning of physical skills 
that are included in the psychomotor domain. 

Discrimination. NRM tests attempt to rank order groups of 
students from high to low. An NRM test item is considered to be a good item 
if those who do well on the total test also do well on that item. Item 
analysis is a procedure through which a test constructor carefully 
evaluates each item to determine if the item discriminates between good 
and poor students. Items that do not have this quality are discarded 
and do not remain in the test. 

In CRM, the best items are those that indicate that a large
percentage of examinees have mastered the instruction. Therefore, good
test items are found among those that show low or zero discrimination.
It could be argued that the best educational situation occurs when
everyone answers all of the items on a CRM test correctly. Thus, it
might be an acceptable point of view to consider the best test items
to be ones of zero discrimination.
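The paper does not say which item analysis statistic was used. One common choice, sketched here as an assumption, is the upper-lower discrimination index: rank students by total score, take the top and bottom 27 percent, and subtract the proportion correct in the lower group from the proportion correct in the upper group. The function name and the 27 percent default are illustrative. The example shows why a mastery item that every student answers correctly has a discrimination of zero.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    item_scores:  0/1 score on one item for each student.
    total_scores: each student's total test score, in the same order.
    fraction:     share of students placed in each extreme group (assumed 27%).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank students from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:group_size], order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


totals = [48, 45, 40, 35, 30, 25, 20, 15, 10, 5]
nrm_item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # only high scorers get it right
mastered_item = [1] * 10                    # every student gets it right
print(discrimination_index(nrm_item, totals))       # 1.0
print(discrimination_index(mastered_item, totals))  # 0.0
```

An NRM item analysis would keep the first item and discard the second; under the CRM view described above, the second item is exactly what instruction aims for.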

Reliability. The reliability (the precision or accuracy of
measurement) is a prime consideration for NRM. Most often, reliability
estimates for NRM are obtained indirectly through correlation
coefficients, since reliability cannot be measured directly.
Reliability is not considered to be such an overriding concern in CRM,
and most CRM instruments are constructed without much attention to
reliability. NRM instruments are usually relatively long tests, since
reliability increases with test length. Since reliability is not as
important to most CRM constructors, CRM instruments are often shorter
tests. It should be pointed out, however, that the reliability of the
skills tests developed in this project was very high: almost all of
the tests developed have had reliability coefficients above .90.
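The paper does not name the reliability coefficient used for the skills tests. For tests scored right or wrong, one standard internal-consistency estimate is the Kuder-Richardson Formula 20 (KR-20), and the Spearman-Brown formula shows how predicted reliability rises as a test is lengthened. The sketch below implements both under those assumptions; the function names and the small response matrix are hypothetical, invented for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: one list per student, one 0/1 entry per item.
    """
    n_students = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        # Every student earned the same total, so KR-20 is undefined.
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_students  # item difficulty
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def spearman_brown(r, length_factor):
    """Predicted reliability when a test is made length_factor times as long."""
    return length_factor * r / (1 + (length_factor - 1) * r)


scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
r = kr20(scores)
print(round(r, 3))                     # 0.75
print(round(spearman_brown(r, 2), 3))  # 0.857
```

A mastery test on which nearly every student earns a perfect score drives the total-score variance toward zero, which is one reason internal-consistency coefficients can be hard to interpret for CRM instruments.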

Validity. There are many methods for determining the validity
of an NRM instrument. Content validity is the most frequently used
means of determining validity for NRM achievement tests. Content
validity attempts to demonstrate that the items on the test constitute
a representative sample of the material covered during instruction.
Since CRM items are based on the skills specified in skills lists,
curricular validity is used to determine the test content. Curricular
validity is established by keying a series of test items to each of the
skills in the skills list.
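Keying items to skills is essentially a lookup table from each skill to its items. The sketch below, a hypothetical helper rather than the project's own software, checks the two properties that curricular validity depends on: every skill has items keyed to it, and every item is keyed to some skill. The skill names and five-item blocks follow the sample test appended to this report.

```python
def check_item_key(skill_key, num_items):
    """Return (skills with no items, items keyed to no skill).

    skill_key: maps each skill on the skills list to its item numbers
               (item numbers start at 1).
    num_items: number of items on the test.
    """
    keyed = set()
    empty_skills = []
    for skill, items in skill_key.items():
        if not items:
            empty_skills.append(skill)
        keyed.update(items)
    unkeyed_items = sorted(set(range(1, num_items + 1)) - keyed)
    return empty_skills, unkeyed_items


skill_key = {
    "Identifies value of all U.S. coins": range(1, 6),
    "Interprets graph data": range(6, 11),
    "Knows days of week and months of year in sequence": range(11, 16),
}
print(check_item_key(skill_key, 15))  # ([], [])
print(check_item_key(skill_key, 16))  # ([], [16])
```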

Previously acquired skills. In NRM, students must often use
previously acquired skills to respond to items so that they may
demonstrate the broad global understandings that are typically measured
by NRM instruments. CRM usually focuses on the learning specified in
the skills lists and consequently does not typically require learners
to integrate as much of their previously acquired skills into their
test performance.

Comparisons. NRM measures a student's performance in relation
to that of the group norm and also to that of each of the other
students. CRM encourages students to compete with themselves to acquire
proficiencies. CRM merely attempts to find out what each student can and
cannot do, rather than who can do more than other students. The
student's score is compared to the criterion rather than to the scores
of other students.
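The contrast between the two kinds of comparison can be made concrete with a single raw score. In this sketch (hypothetical function names; the 70 percent criterion is an assumed value), the same score of 4 out of 5 is the lowest in a small group, giving it a percentile rank of 0, yet it still meets the criterion.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percent of the group scoring below `score`.

    (Other percentile-rank conventions also count half of any ties.)
    """
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


def meets_criterion(score, num_items, criterion_pct=70):
    """Criterion-referenced view: pass if percent correct reaches the criterion.

    Integer arithmetic avoids floating-point trouble exactly at the cutoff.
    """
    return 100 * score >= criterion_pct * num_items


group = [4, 4, 5, 5, 5]   # five students on a five-item skill
print(percentile_rank(4, group))  # 0.0
print(meets_criterion(4, 5))      # True
```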

Distribution of test scores. The distribution of test scores in
NRM is, ideally, a normal distribution. The CRM distribution of scores 
consists of two rectangular distributions, passing and not passing. 
These distributions are illustrated in Figure 8. 



Figure 8 goes about here. 



The NRM distribution is appropriate for the purpose of NRM, which
is to order the group measured from high to low. The two rectangular
distributions illustrate the function of CRM, which is to separate
students who have mastered skills from those who have not.

Instruction related to the test. Instructors who teach to an
NRM test try to maximize the amount of material covered. The goal of a
teacher teaching to an NRM test is to provide a thorough overview of
the subject matter through a complete survey of the field. Instructors
teaching with the anticipation of a CRM test try to maximize the
percentage of students who will master the skills.

Score. The score received on an NRM test is usually the number
of items answered correctly or the percentage of correct responses. As
previously indicated, a student receives only one of two scores on a
CRM test: pass or non-pass.

Function. NRM measures the amount of knowledge individual
students have learned by ranking them from high to low. CRM evaluates
the effectiveness of instruction by determining how many students in
each class have mastered the skills.



Advantages of NRM and CRM 
Articles have been written which have declared NRM to be immoral
and have proclaimed CRM to be the only humane way to evaluate
students. The rationale for these articles appears to be that it is
an inherent characteristic of NRM for half of the students to miss
each item and for half of the students to fall below the norm. This
approach does not serve to motivate students, and consequently NRM
fails to encourage the type of success that enhances motivation and
learning. The critics of NRM also fault student vs. student
competition and consider the competition of students with themselves
or with a criterion to be healthier and non-destructive.

Certainly the potential for evaluating instruction is greater
in CRM than in NRM, because traditional NRM has never been concerned
with the evaluation of instructional effectiveness or with the
improvement of subsequent instruction. The NRM model has been
preoccupied with aptitude, selection, prediction, and inference. The
CRM model is concerned with evaluating and revising instruction. When
the criteria are clear and simple, CRM can yield more meaningful
information than the NRM model provides. The information it provides
concerning the mastery or non-mastery of skills is more useful for
helping students who have specific learning difficulties.



Method for School Improvement through Inservice Education 

Although this method of improving schools through inservice
test construction has been tried in six school systems in rural
Indiana and Illinois, it was developed most extensively in Rossville,
Illinois, and is therefore called the Rossville model.

Inservice training is used to train faculty in the testing
model. After faculty are educated concerning the inadequacies of
standardized testing, they are very receptive to a new model.
Inservice workshops are provided for faculty to discuss and draft the
skills lists that will serve as the basis for determining the content
of the CRM tests.

A committee is then formed consisting of one teacher for each
subject to be tested at each grade level. These grade level
chairpersons use the Delphi method to confer with their fellow
teachers. Each grade level representative is in charge of an informal
committee for the assigned subject area and grade level. Teachers at
each grade level confer informally and complete the list of skills
that they agree students should have mastered at the end of each
semester of instruction.

Each informal committee works with its grade level chairperson
to develop test items to test the skills. The committee of grade level
chairpersons meets periodically with a testing consultant or a
curriculum specialist to develop the skills tests. Figure 9 shows an
organizational chart of the project.
Figure 9 goes about here.
It has been observed that many otherwise competent teachers do
not write good test items. Therefore, the test development phase of
the project uses large item pools. Rather than being randomly generated
by a computer, the items are carefully selected and screened by the
committee to form semester competency tests in the basic skills for
each grade level.

The committee members are instructed to select or write items
so that, by their estimate, at least 70% of the students who have been
instructed in the skill would answer at least 70% of the items
correctly.
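Once results are in, whether the committee's 70/70 estimate held for a given skill or test can be checked directly. The sketch below is illustrative only (the function name and data are hypothetical): it counts the students whose percent correct reaches 70 and asks whether they make up at least 70 percent of the group. On a five-item skill, reaching 70 percent means answering at least four items correctly, since three of five is only 60 percent.

```python
def meets_70_70(responses, item_pct=70, student_pct=70):
    """Check the 70/70 target for one skill or test.

    responses: one list of 0/1 item scores per student.
    Returns (share of students reaching item_pct, whether that share
    reaches student_pct). Integer comparisons avoid rounding at cutoffs.
    """
    num_items = len(responses[0])
    reached = sum(1 for row in responses
                  if 100 * sum(row) >= item_pct * num_items)
    share = reached / len(responses)
    return share, 100 * reached >= student_pct * len(responses)


# Ten students on a five-item skill: eight answer four or more correctly.
results = ([[1, 1, 1, 1, 1]] * 2 + [[1, 1, 1, 1, 0]] * 6
           + [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]])
print(meets_70_70(results))  # (0.8, True)
```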

After the tests have been administered and scored, each
unmastered skill is analyzed to determine whether the test items, the
skills lists, or the instruction need to be improved for future groups
of students at each grade level.

Tests are scored by a mark sense reader and results are 
interpreted by a computer. After the results have been analyzed by the 
testing consultant, a summary of the findings is presented to the 
faculty and the school board. 

Characteristics of the CRM 

The Rossville model measures skills that teachers expect
students to have mastered. The target is for at least 70 percent of
the students to answer at least 70 percent of the items correctly.

The Rossville model possesses all of the advantages that were
described in an earlier part of this report and also goes a long way
toward overcoming many of the disadvantages cited for NRM.

Among the specific characteristics of this method are:

1. The tests are inexpensive and relatively easy to
construct.

2. The tests monitor student progress and provide a
diagnosis-remediation approach to learning. 

3. The tests measure important basic skills. 

4. The testing design procedure involves the total
instructional staff.

5. The same testing pattern is integrated into all grades. 

6. Tests are free from errors and contain clear and 
unambiguous items. 

7. There are scores for students on each skill and for the 
total test. 

8. There are grade summaries for each skill at each grade 
level. 
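Characteristics 7 and 8 in the list above amount to a small aggregation over the scored answer sheets. The sketch below is hypothetical: the skill names are shortened, and a student is assumed to master a skill by answering at least 70 percent of its items correctly, a cutoff this section does not specify. It produces a score and mastery flag for each student on each skill, a total score, and, for each skill, the percentage of the grade that mastered it.

```python
def skill_report(responses, skill_key, mastery_pct=70):
    """Per-student skill scores and mastery, plus a grade summary.

    responses:   {student: list of 0/1 item scores, item 1 first}
    skill_key:   {skill: list of item numbers (1-based)}
    mastery_pct: assumed percent correct needed to master a skill.
    """
    students = {}
    mastered_count = {skill: 0 for skill in skill_key}
    for name, scores in responses.items():
        row = {"total": sum(scores), "skills": {}}
        for skill, items in skill_key.items():
            score = sum(scores[i - 1] for i in items)
            mastered = 100 * score >= mastery_pct * len(items)
            row["skills"][skill] = (score, mastered)
            if mastered:
                mastered_count[skill] += 1
        students[name] = row
    # Grade summary: percent of students who mastered each skill.
    summary = {skill: 100.0 * count / len(responses)
               for skill, count in mastered_count.items()}
    return students, summary


skill_key = {"Coins": [1, 2, 3, 4, 5], "Graphs": [6, 7, 8, 9, 10]}
responses = {
    "Student A": [1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
    "Student B": [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
}
students, summary = skill_report(responses, skill_key)
print(students["Student A"])
# {'total': 7, 'skills': {'Coins': (5, True), 'Graphs': (2, False)}}
print(summary)  # {'Coins': 100.0, 'Graphs': 50.0}
```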



Advantages of the CRM Model 
Advantages of the CRM model over traditional standardized tests
are:

Time factor. High quality tests can be developed from start to
finish in a few months.

Skills lists. The skills lists serve to articulate the
curriculum in the basic skills and to provide a basis for test 
development. 

Management system. There is a diagnostic-remediation feature
(a clinical approach to learning) that is not available in most other
testing methods.

Curriculum. The skills lists cause faculty to carefully
examine what they are trying to accomplish. In some school systems
this has not been done in the past forty years.

Professional appearance. When tests are printed by a 
commercial printer, they have a professional appearance. 

Item pool. The teachers may either write items or work with
previously written items from a large item pool.

Scoring. Results are obtained by scoring with mark sense 
equipment and analysis by computer. 

Test analysis. The tests and individual items on each test are
constantly monitored by a computerized item analysis.

Revisions. Where test results indicate, items are revised to 
correct any editorial or statistical deficiencies. 
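A minimal version of this item monitoring might compute each item's proportion correct and flag items that fall below a floor for review. The 0.70 floor is an assumption chosen to match the 70/70 target, and the function name is hypothetical; the paper does not say what statistics its item analysis reported. A flagged item is only a candidate: as noted earlier, the cause may lie in the item, the skills list, or the instruction.

```python
def flag_items(responses, min_p=0.70):
    """Return (item number, proportion correct) for items below min_p.

    responses: one list of 0/1 item scores per student.
    min_p:     assumed floor on the proportion of students answering correctly.
    """
    n = len(responses)
    flagged = []
    for j in range(len(responses[0])):
        p = sum(row[j] for row in responses) / n
        if p < min_p:
            flagged.append((j + 1, p))
    return flagged


responses = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(flag_items(responses))  # [(3, 0.25)]
```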



Summary 


This paper has described the logic and the procedures used
in an inservice approach to curriculum and school improvement through
the development of local tests in the basic skills. The approach has
been attempted at six schools. Teachers at all of the schools are
enthusiastic about the instructional advantages of this testing model
over traditional testing. None of the schools has discontinued using
it to date. Teachers prefer the tests to traditional norm referenced
tests and find that the information the tests provide helps them work
with individual students.

Planned future developments are to extend the testing method
to subjects other than the basic skills and to computerize the
entire testing process so that students can have tests scored and
interpreted while working at a microcomputer.

Although this is a method which requires much work on the part
of the teachers and consultants and much cooperation from the school
administration, it has proven to eliminate many of the inadequacies
and much of the unfairness that have been associated with standardized
norm referenced tests.
ITEMS    SKILLS                                                          SCORE   MASTERY

1-5      Identifies value of all U.S. coins.
6-10     Interprets graph data.
11-15    Knows days of week and months of year in sequence.
16-20    Identifies operation needed (addition or subtraction)
         in story problems.
21-25    Writes time correct to half-hour.
26-30    Knows meaning of halves, thirds, and fourths.
31-35    Measures to ½ inch.
36-40    Measures to nearest centimeter.
41-45    Understands number families.
46-50    Knows multiplication facts 1's, 2's, 3's, 5's, and IlTs.
Choose the best answer:

1. 4 quarters =
a) 90¢   b) 100¢   c) 60¢   d) 75¢

2. A penny is
a) $0.01   b) $0.05   c) $0.10   d) $0.25

3. A nickel is
a) $0.01   b) $0.05   c) $0.10   d) $0.25

4. A quarter is
a) 1¢   b) 5¢   c) 10¢   d) 25¢

5. A dime is
a) 1¢   b) 5¢   c) 10¢   d) 25¢
Answer items 6-8 using this graph.

[Graph: NUMBER OF HOURS DAVID FISHED]

6. David fished ____ hours on Thursday.
a) *   b) 5   c) 6   d) 8

7. David fished the shortest time on ____.
a) Monday   b) Tuesday   c) Wednesday   d) Thursday

8. David fished the longest on ____.
a) Monday   b) Tuesday   c) Thursday   d) Friday
Answer items 9 and 10 using this graph.

[Bar graph: BOXES OF COOKIES SOLD by Sally, Ruth, Terry, and Connie, on a scale of 0 to 60]

9. Who sold about 45 cookies?
a) Connie   b) Ruth   c) Terry

10. About how many cookies did Connie sell?
a) 15   b) 25   c) 35
Look at the calendar to answer these questions.

11. What day of the week is the fifth of May?
a) Monday   b) Tuesday   c) Wednesday   d) Thursday

12. What day does May begin on?
a) Saturday   b) Sunday   c) Monday   d) Friday

Which is missing?
13. June ____ August
a) May   b) April   c) September   d) July

Which is missing?
14. Tuesday ____ Thursday
a) Wednesday   b) Friday   c) Monday   d) Saturday

Which is missing?
15. February ____ April
a) January   b) March   c) December   d) July

For questions 16-20, read the story problem, and choose the correct answer.

16. Mary had 10 balloons; 3 broke. How many balloons were left?
a) 7+3=10   b) 10-3=7   c) 10-7=3   d) 3+7=10

17. There were 14 cats and 22 dogs in the pet show. How many cats and
dogs in all were in the pet show?
a) 36-22=14   b) 36+22=14   c) 14+22=36   d) 36-14=22

18. Bill had 20 cars. He lost 8 cars. How many did he have left?
a) 20-8=12   b) 12+8=20   c) 8+12=20   d) 20-12=8

19. Steve had 14 apples. Dave had 5 apples. How many more apples did
Steve have than David?
a) 5+9=14   b) 14-5=9   c) 14-9=5   d) 9+5=14
20. Polly found 13 butterflies. John found 10 butterflies. How many
more did Polly find than John?
a) 13-3=10   b) 10+3=13   c) 3+10=13   d) 13-10=3



Choose the correct time.

21.
a) 3:15   b) 10:00   c) 10:15   d) 10:30

22.
a) 6:15   b) 2:30   c) 6:30   d) 2:60

23.
a) 12:00   b) 6:00   c) 5:45   d) 12:50

24.
a) 6:00   b) 4:30   c) 9:00   d) 12:00

25.
a) 3:00   b) 4:00   c) 9:00   d) 10:00



Choose the correct answer.

26. Which circle is divided into fourths?
a)   b)   c)   d)

27. Which square is divided into thirds?
a)   b)   c)   d)

28. How much is shaded?
a) 1/8   b) 1/4   c) 1/3   d) 1/2

29. How much is shaded?
a) 1/8   b) 1/4   c) 1/3   d) 1/2

30. Which part of the object is shaded?
a) 1/2   b) 1/3   c) 1/4   d) 1/5



Choose the correct measurement.

31.
a) 2 inches   b) 2½ inches   c) 3 inches

32.
a) 1 inch   b) 2 inches   c) 1½ inches

33.
a) 1½ inches   b) 2 inches   c) 2½ inches

34.
a) 1½ inches   b) 2½ inches   c) 2 inches

35.
a) 1½ inches   b) 2 inches   c) 2½ inches

Circle the correct measurements in centimeters.

36.
a) 3 centimeters   b) 2 centimeters   c) 5 centimeters

37.
a) 2 centimeters   b) 3 centimeters   c) 4 centimeters

38.
a) 3 centimeters   b) 5 centimeters   c) 4 centimeters

39.
a) 3 centimeters   b) 2 centimeters   c) 4 centimeters

40.
a) 3 centimeters   b) 2 centimeters   c) 4 centimeters
41. Choose the number family for the numbers (8, 2, 10).
a) 8+8=16   8+2=10   2+8=10   10-8=2
b) 10-5=5   10-2=8   8+2=10   2+8=10
c) 8+2=10   2+8=10   10-8=2   10-2=8

42. Choose the number family for the numbers (8, 9, 17).
a) 8+9=17   9+8=17   17-9=8   17-8=9
b) 17-6=11   8+9=17   9+8=17   17-9=8
c) 8+7=15   8+9=17   17-8=9   17-9=8

43. Choose the number family for the numbers (3, 9, 12).
a) 3+3=6   9+3=12   3+9=12   12-9=3
b) 12-9=3   12-3=9   9+3=12   3+9=12
c) 12-6=6   3+9=12   12-3=9   9+3=12

44. Choose the number family for the numbers (15, 4, 11).
a) 15-4=11   11+4=15   15-11=4   4+11=15
b) 15-5=10   15-11=4   15-4=11   11+4=15
c) 4+11=15   4+7=11   15-4=11   15-11=4

45. Choose the number family for the numbers (9, 5, 14).
a) 14-9=5   14-5=9   5+9=14   9+5=14
b) 9+5=14   14-5=8   14-9=5   5+9=14
c) 9+5=14   4+9=13   14-9=5   14-5=9



CHOOSE THE CORRECT ANSWER

46. 3X2=
a) 5   b) 6   c) 10   d) 1

47. 6X1=
a) 24   b) 22   c)   d) 6

48. 8X5=
a) 32   b) 13   c) 40   d) 18

49. 7X3=
a) 13   b) 42   c) 35   d) 21

50. 9X8=
a) 72   b) 17   c) 42   d) 64



END 






M 3-2   Revised 1983-84



